ABSTRACT

Determination of the minimum resources required to parse a language generated by a given context-free grammar is an intriguing and yet unsolved problem. It seems plausible that any unambiguous context-free grammar could be parsed in time proportional to the length, n, of each input string. Earley (2) has presented an algorithm which parses "many" grammars in time proportional to n, but requires time proportional to n^2 on some. His work is an extension of Knuth's algorithm, which leads to a very efficient parse, in time proportional to n, of deterministic languages. This memo presents a different extension of Knuth's method. Knuth's method fails when more than one alternative must be examined by a pushdown automaton making a left-to-right scan of the input string. Earley's extension takes all possible alternatives simultaneously, without duplication of effort at any given step. The method presented here continues through the string in order to gain information which will resolve the conflict in the ensuing right-to-left pass, which is made on the symbols accumulated on the stack of the automaton. The algorithm is probably more efficient than Earley's on certain grammars; it will fail completely on others. The essential idea may be interesting to those attacking the general problem.
I. Introduction

I will assume that the reader is familiar with context-free languages and the notation given in Knuth's paper.

The need to design computer languages makes it important to understand which languages can be easily parsed. No general parsing algorithm for unambiguous context-free languages has been proposed which does not require time at least proportional to n^2, where the string to be parsed is n symbols long. On the other hand, we have not yet been able to find an unambiguous grammar which could not be parsed in time proportional to n by some algorithm. The algorithm below handles a wide class of grammars.

The handle of a sentential form is defined as the leftmost string of characters in the sentential form which equals the right side of some rule. Knuth has treated grammars where every handle can be found by considering the characters to its left, the characters in the handle, and a fixed number k (the same for all handles) of characters to the right of the handle. Such a grammar can be parsed in time proportional to n by a deterministic PDA scanning the string from left to right. For example, the grammar G1:

S → A
S → cB
A → aAb
A → ab
B → aBbb
B → abb

has the sentential forms

1. [A]
2. a^n [aAb] b^n
3. a^n [ab] b^n
4. [cB]
5. c a^n [aBbb] b^{2n}
6. c a^n [abb] b^{2n}

where the handle has been enclosed in brackets in each case.

The handle in 6 is distinguished from the handle in 3 by the "c" at the left end of the string. It is not necessary to look at characters to the right of the handle, so k = 0, and the grammar is said to be LR(0). On the other hand, grammar G1 cannot be parsed in this manner from the right, for one must look arbitrarily far ahead to find the "c" in order to distinguish 6 from 3. By similar arguments we conclude that a grammar which generates the language with strings of the forms

c a^n ... a^n
a^n b^{2n} ... a^n d
c a^n ... a^n d
a^n b^{2n} b^{2n} a^n

can be parsed from neither the left nor the right. In this memo, we extend Knuth's method to handle languages of this type.

Knuth thinks of the parser as a finite-state machine with a pushdown stack. At each step the parser must either add an input symbol to its stack, or reduce the stack by recognizing that the symbols at the top of the stack correspond to a handle and replacing these symbols on the stack by the left side of the corresponding rule. In general, this will require information about characters to the left, such as the "c" in example G1. Knuth showed that all the useful information about characters, or reductions already made, to the left can be represented by one of a finite number of states.

The state is stored on the stack after each symbol which is added to the stack. The new state is computed from the old state on the top of the stack and the symbol added. When the stack is reduced, the state symbols for the symbols in the handle are thrown away.
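The shift-reduce mechanism just described can be sketched for G1, which the text says is LR(0). The Python below is an illustration under our own assumptions, not Knuth's construction verbatim: items are pairs (production index, dot position), a production 0, S' → S#, is added as an end marker, and the LR(0) states are computed on demand rather than tabulated in advance. Only the state on top of the stack is consulted when deciding whether to shift or reduce, and a reduction discards the handle's symbols together with their states.

```python
# Grammar G1, augmented with a production 0, S' -> S#, as an end marker.
PRODUCTIONS = [
    ("S'", ("S", "#")),
    ("S", ("A",)),
    ("S", ("c", "B")),
    ("A", ("a", "A", "b")),
    ("A", ("a", "b")),
    ("B", ("a", "B", "b", "b")),
    ("B", ("a", "b", "b")),
]
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS}


def closure(items):
    """LR(0) closure of a set of items (production index, dot position)."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for p, dot in list(result):
            rhs = PRODUCTIONS[p][1]
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                for q, (lhs, _) in enumerate(PRODUCTIONS):
                    if lhs == rhs[dot] and (q, 0) not in result:
                        result.add((q, 0))
                        changed = True
    return frozenset(result)


def goto(state, symbol):
    """State reached from `state` after `symbol` is added to the stack."""
    moved = {(p, dot + 1) for p, dot in state
             if dot < len(PRODUCTIONS[p][1]) and PRODUCTIONS[p][1][dot] == symbol}
    return closure(moved) if moved else None


def parse(text):
    """Return the productions used in reductions, in order, or None on error."""
    stack = [closure({(0, 0)})]          # alternates: state, symbol, state, ...
    tokens = list(text) + ["#"]
    pos, reductions = 0, []
    while True:
        complete = [p for p, dot in stack[-1] if dot == len(PRODUCTIONS[p][1])]
        if complete:                     # G1 is LR(0): a complete item means reduce
            p = complete[0]
            if p == 0:                   # S' -> S# recognized
                return reductions if pos == len(tokens) else None
            lhs, rhs = PRODUCTIONS[p]
            del stack[len(stack) - 2 * len(rhs):]   # drop handle and its states
            reductions.append(p)
            symbol = lhs
        else:
            if pos == len(tokens):
                return None
            symbol = tokens[pos]
            pos += 1
        nxt = goto(stack[-1], symbol)
        if nxt is None:
            return None
        stack += [symbol, nxt]


print(parse("caabbbb"))   # [6, 5, 2]: B -> abb, B -> aBbb, S -> cB
```

Computing the states on demand keeps the sketch short; a production parser would build the finite table of states once and look transitions up.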

The method gets into trouble when it is not possible to determine a unique handle with the allowable information. For example, parsing G1 from the right, we would reach the point abbb..., where the handle could be either ab or abb. Earley (2) allows his parser to explore both possibilities by maintaining a two-dimensional array instead of a linear stack. The method used here will be to let the finite-state language which summarizes what has been found thus far become conceptually non-deterministic. We will take a path for each of the possibilities which could arise if the stack were reduced. The stack will not be reduced, so that no information will be lost. Without reducing the stack we cannot see what state symbol would have ended up on top of the stack after the reduction, so more information will have to be brought along by the non-deterministic finite-state language. Even though we conceptually take several paths, we often constrain the possible parsings for the rest of the string to the right. Sometimes, as the parser moves farther to the right, input symbols will be found which show that some of the paths cannot apply to the string. The stack can be reduced whenever all current paths indicate the same reduction. Once the right end of the string has been reached, the stack is parsed from the right using Knuth's algorithm, except that the state symbols from the left-to-right pass are used to restrict the possibilities which the algorithm considers.

II. Description of the Algorithm

We follow Knuth in the description, except that productions of the form A → ε will not be treated, since they add even more complexity. We define the algorithm to be LR(k1) and RL(k2) for the k1 and k2 used below.

First, let H_{k1}(α) be the set of all k1-letter strings β over T ∪ {#} such that α ⇒* βγ for some γ. Next, suppose the p-th production of the grammar to be parsed has the form A_p → X_{p1} ... X_{pn_p}. A state [p, j; α, m, t] represents the first j letters of the p-th production, 0 ≤ j ≤ n_p, and a string α which could follow the left side of the p-th production in some sentential form. We also introduce a virtual state, (p, j; α, m, t). Virtual states are interpreted the same as states, but the algorithm uses them differently. m is any integer identifying this state or virtual state from all others created during the parse, and t is a list of integers identifying other states or virtual states. During the translation process we maintain a stack, denoted by

   #^{k2} S_0 X_1 S_1 X_2 S_2 ... X_n S_n | Y_1 ... Y_{k1} ... #^{k1}      (1)

The portion to the left of the vertical line consists alternately of state sets and characters; this is the portion of the string which has been considered by the left-to-right pass. The state sets contain both regular and virtual states. The steps for the left-to-right pass are given below.
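The function H_{k1} can be computed mechanically for the ε-free grammars considered here. The sketch below is our own illustration, not part of the memo: it computes k-truncated FIRST sets by a fixpoint iteration and then applies them to a symbol string α. Symbols are single characters, upper-case letters are nonterminals, and the grammar is the example of Section III without its production 0; the names `first_k`, `concat_k` and `H` are hypothetical.

```python
def concat_k(left, right, k):
    """k-truncated concatenation of two sets of terminal strings."""
    return {(u + v)[:k] for u in left for v in right}


def first_k(grammar, k):
    """Terminal prefixes of length <= k derivable from each nonterminal.
    Assumes no production has an empty right side."""
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                acc = {""}
                for x in rhs:
                    acc = concat_k(acc, first[x] if x in grammar else {x}, k)
                if not acc <= first[a]:
                    first[a] |= acc
                    changed = True
    return first


def H(grammar, k, alpha):
    """All k-letter strings beta such that alpha =>* beta gamma for some gamma."""
    first = first_k(grammar, k)
    acc = {""}
    for x in alpha:
        acc = concat_k(acc, first[x] if x in grammar else {x}, k)
    return {w for w in acc if len(w) == k}


# Example grammar of Section III (productions 1-8).
G = {"S": ["cBaA", "cAA", "BaB", "AB"], "A": ["aAb", "ab"], "B": ["aBbb", "abb"]}
print(sorted(H(G, 2, "AB#")))   # ['aa', 'ab']
```

Because no rule is empty, every symbol contributes at least one character, so appending #^{k1} to α guarantees that every result has exactly k letters.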
We start the stack with k2 #'s and then enter the following loop, with n = 0 and S_0 = {[0, 0; #^{k1}, 0]}. At each step, assume the stack contents are as shown in (1).

Step 1. Compute the "closure" S of S_n, which is defined recursively as the smallest set satisfying the following equation:

   S = S_n ∪ {[q, 0; β, m, (m_1)] | there exists [p, j; α, m_1, t] in S, j < n_p,
              or there exists (p, j; α, m_1, t) formed by the previous application of Step 3, j < n_p;
              and X_{p(j+1)} = A_q, and β in H_{k1}(X_{p(j+2)} ... X_{pn_p} α)}

(We thus have added to S all productions that might apply, in addition to those we are already working on.)
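Step 1 for k1 = 1 can be sketched as follows. This is an illustration under stated assumptions, not the memo's own program: states are tuples (p, j, α, m, t, virtual); fresh identifiers m come from a counter; a new state's list t holds the identifier of the state that caused it to be added, which is how the example state sets of Section III use t; and, so that the set stays finite even for left-recursive rules, a production reached twice with the same lookahead is recorded once with both parents in t (the memo does such merging in Step 3). Production 0 is taken to be S' → S# with lookahead #.

```python
from collections import namedtuple
from itertools import count

# Example grammar of Section III; production 0 is the end-marker rule (assumed form).
PRODS = [("S'", "S#"), ("S", "cBaA"), ("S", "cAA"), ("S", "BaB"), ("S", "AB"),
         ("A", "aAb"), ("A", "ab"), ("B", "aBbb"), ("B", "abb")]
NONTERMS = {lhs for lhs, _ in PRODS}

# [p, j; alpha, m, t] when virtual is False, (p, j; alpha, m, t) when True.
State = namedtuple("State", "p j alpha m t virtual")


def first1_table():
    """FIRST_1 of each nonterminal; with no empty rules only rhs[0] matters."""
    first = {a: set() for a in NONTERMS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in PRODS:
            new = first[rhs[0]] if rhs[0] in NONTERMS else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


FIRST1 = first1_table()


def h1(symbols):
    """H_1 of a non-empty symbol string (epsilon-free grammar)."""
    x = symbols[0]
    return FIRST1[x] if x in NONTERMS else {x}


def closure(states, fresh):
    """Step 1: add [q, 0; beta, m, (parent)] for each production q that may apply."""
    result = {(s.p, s.j, s.alpha, s.virtual): s for s in states}
    work = list(states)
    while work:
        s = work.pop()
        rhs = PRODS[s.p][1]
        if s.j >= len(rhs) or rhs[s.j] not in NONTERMS:
            continue                       # next symbol is a terminal: nothing to add
        for beta in h1(rhs[s.j + 1:] + s.alpha):
            for q, (lhs, _) in enumerate(PRODS):
                if lhs != rhs[s.j]:
                    continue
                key = (q, 0, beta, False)
                if key in result:          # already added: just record another parent
                    old = result[key]
                    result[key] = old._replace(t=old.t | {s.m})
                else:
                    new = State(q, 0, beta, next(fresh), frozenset({s.m}), False)
                    result[key] = new
                    work.append(new)
    return set(result.values())


s00 = closure({State(0, 0, "#", 0, frozenset(), False)}, count(1))
# The (p, j, alpha) parts agree with S_{0,0} in Section III.
print(sorted((s.p, s.j, s.alpha) for s in s00 if s.p != 0))
```

The identifiers produced here need not coincide with the numbers printed in Section III; only the production, position, lookahead and parent structure are meant to match.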

Step 2. Compute the following sets of k1-letter strings:

   Z   = {β | there exists [p, j; α, m, t] in S, j < n_p, β in H_{k1}(X_{p(j+1)} ... X_{pn_p} α)}

   Z_p = {α | [p, n_p; α, m, t] in S},   0 ≤ p ≤ s, where s is the number of productions.

Z, Z_0, ..., Z_s must all be disjoint, or the grammar is not LR(k1) and virtual states must be formed.

a) If Y_1 ... Y_{k1} lies in Z and not in any Z_p, shift the stack left:

   #^{k2} S_0 X_1 S_1 ... X_n S_n Y_1 | Y_2 ... Y_{k1} ...

and rename its contents by letting X_{n+1} = Y_1, Y_1 = Y_2, ..., and go to Step 3.

b) If Y_1 ... Y_{k1} lies in exactly one Z_p and not in Z, then the characters forming the right side of production p are on the top of the stack. Check to see if the state sets S_{n-n_p+1}, ..., S_{n-1} contain any states [q, j; β] for which j < n_q. If this is not so, the stack can be reduced. Let r = n - n_p; the stack now contains #^{k2} S_0 X_1 S_1 ... X_r S_r X_{r+1} ... X_n S_n | Y_1 ... Y_{k1}. Replace the string X_{r+1} S_{r+1} ... X_n S_n by A_p to obtain

   #^{k2} S_0 X_1 S_1 ... X_r S_r A_p | Y_1 ... Y_{k1} ...

and let n = r, X_{n+1} = A_p. Now go to Step 3. Otherwise, shift the stack left as in a) above and go to Step 3.
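The decision made in Step 2 can be sketched for k1 = 1. The function below is an illustration only: it takes the (p, j, α) parts of the regular states of S, builds Z and the sets Z_p, and reports whether the next input character calls for a shift, a reduction, a conflict (the situation in which the memo forms virtual states), or neither. It does not model the extra check in b) on the state sets inside the handle. The names `step2` and `h1` and the precomputed FIRST_1 table are our own, for the example grammar of Section III.

```python
PRODS = [("S'", "S#"), ("S", "cBaA"), ("S", "cAA"), ("S", "BaB"), ("S", "AB"),
         ("A", "aAb"), ("A", "ab"), ("B", "aBbb"), ("B", "abb")]
NONTERMS = {lhs for lhs, _ in PRODS}
FIRST1 = {"S'": {"a", "c"}, "S": {"a", "c"}, "A": {"a"}, "B": {"a"}}  # precomputed


def h1(symbols):
    """H_1 of a non-empty symbol string (epsilon-free grammar)."""
    x = symbols[0]
    return FIRST1[x] if x in NONTERMS else {x}


def step2(states, y):
    """states: (p, j, alpha) of the regular states in S; y: next input character."""
    z, z_p = set(), {}
    for p, j, alpha in states:
        rhs = PRODS[p][1]
        if j < len(rhs):
            z |= h1(rhs[j:] + alpha)               # characters that continue a rule
        else:
            z_p.setdefault(p, set()).add(alpha)    # lookaheads that complete rule p
    reducers = [p for p, lookaheads in z_p.items() if y in lookaheads]
    if y in z and not reducers:
        return "shift"
    if len(reducers) == 1 and y not in z:
        return ("reduce", reducers[0])
    if not reducers:
        return "error"
    return "conflict"        # y lies in Z and some Z_p, or in several Z_p


# After "c a b": A -> ab is complete, while B -> ab.b still needs a "b".
print(step2({(6, 2, "a"), (8, 2, "a")}, "a"))   # ('reduce', 6)
print(step2({(6, 2, "a"), (8, 2, "a")}, "b"))   # shift
```

With lookahead b on both states, as in the pair [6,2;b] and [8,2;b] of the example's state set S_{0,n+2}, the same call returns "conflict".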



Step 3. The stack now has the form

   #^{k2} S_0 X_1 S_1 ... X_n S_n X_{n+1} | Y_1 ... Y_{k1} ...

Form

   S'_{n+1} = {[p, j+1; α, m, t] | [p, j; α, m, t] in S, j < n_p, and X_{n+1} = X_{p(j+1)}}
            ∪ {(p, j+1; α, m, t) | (p, j; α, r, t) in S_n, j < n_p, and X_{n+1} = X_{p(j+1)}}
            ∪ {(p, j+1; α, m, t) | (p, j; α, r, t) or [p, j; α, r, t] in S_n, j < n_p,
                 and (q, n_q; β, u, (...r...)) or [q, n_q; β, u, (...r...)] in S_n}
            ∪ {(p, j; α, m, t) | (p, j; α, m, t) in S_n, j < n_p,
                 and (q, l; β, u, (...m...)) or [q, l; β, u, (...m...)] in S'_{n+1}, l < n_q}

Now form S_{n+1} from S'_{n+1} by (a) merging all pairs of states [p, j; α, m, t], [p, j; α, m', t'] by forming a state [p, j; α, m, t ∪ t'] and changing the references to m' to refer to state m, and (b) doing the same for all such pairs of virtual states. I indicate the merge as a separate step because it is thus easier to explain. If the first character to the right of the bar is "#", begin the right-to-left pass through the stack; otherwise, go to Step 1.
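The merge that turns S'_{n+1} into S_{n+1} can be sketched as below. This is a hypothetical helper, not the memo's program: it merges states (or virtual states) that agree in p, j and α, keeps the smaller identifier, takes the union of the lists t, and redirects references inside the same set to the surviving state; references held in earlier state sets are not rewritten here. The identifiers in the usage example are made up for illustration.

```python
from collections import namedtuple

State = namedtuple("State", "p j alpha m t virtual")


def merge(states):
    """Merge states agreeing in (p, j, alpha) and in being regular or virtual."""
    survivors, rename = {}, {}
    for s in sorted(states, key=lambda s: s.m):
        key = (s.p, s.j, s.alpha, s.virtual)
        if key in survivors:
            keep = survivors[key]
            survivors[key] = keep._replace(t=keep.t | s.t)
            rename[s.m] = keep.m               # references to s.m now mean keep.m
        else:
            survivors[key] = s
    return {s._replace(t=frozenset(rename.get(x, x) for x in s.t))
            for s in survivors.values()}


before = {
    State(5, 1, "b", 64, frozenset({62}), True),
    State(5, 1, "b", 70, frozenset({64}), True),   # same p, j, alpha: merged into 64
    State(6, 0, "b", 71, frozenset({70}), False),  # its reference to 70 becomes 64
}
after = merge(before)
for s in sorted(after):
    print(s)
```

A survivor's identifier is never renamed, so a single lookup in `rename` is enough; no chains of renamings can form.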

The right-to-left pass is done in the same manner as the left-to-right pass, with three exceptions. First, no virtual states are formed; if the algorithm reaches a point where a virtual state would be formed, it reports failure. Second, since we are going right to left, it is necessary to sequence through the characters of the right-hand side of a production in the opposite direction. When a new state is formed by Step 1, it has the form [q, n_q + 1; β]; then, in Step 3, j is decremented instead of incremented. Third, the state set S_n formed by Step 1 is replaced by the set S̄_n formed as follows. Let S be the left-to-right state set at the top of the input. Then

   S̄_n = {[p, j; α] | [p, j; α] is in S_n and [p, j-1; β] is in S for some β}
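The restriction used in the right-to-left pass can be checked against the example of Section III. In the sketch below (an illustration with hypothetical names), states are reduced to (p, j, α) triples, α = ε is written as the empty string, and both regular and virtual left-to-right states count as matches. Taking the right-to-left closure S_{0,0} and the left-to-right set S_{1,3n+3} from the representative state sets of Section III, only [1,5; ε] survives; the match comes from the virtual state (1,4;#,66).

```python
def restrict(rl_states, lr_states):
    """Keep [p, j; alpha] only if [p, j-1; beta] occurs in the left-to-right set."""
    positions = {(p, j) for p, j, _ in lr_states}
    return {(p, j, a) for p, j, a in rl_states if (p, j - 1) in positions}


# Right-to-left closure at the start of the pass (k2 = 0, so alpha is empty).
rl_s00 = {(1, 5, ""), (2, 4, ""), (3, 4, ""), (4, 3, ""),
          (5, 4, ""), (6, 3, ""), (7, 5, ""), (8, 4, "")}
# Left-to-right set S_{1,3n+3}: (1,4;#,66) (2,2;#,61) [5,2;#,62,61].
lr_top = {(1, 4, "#"), (2, 2, "#"), (5, 2, "#")}
print(restrict(rl_s00, lr_top))   # {(1, 5, '')}
```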



III. Example

The grammar

   0. S' → S#
   1. S → cBaA
   2. S → cAA
   3. S → BaB
   4. S → AB
   5. A → aAb
   6. A → ab
   7. B → aBbb
   8. B → abb

produces the strings

   1. c a^n b^{2n} a^{m+1} b^m
   2. c a^n b^n a^m b^m
   3. a^n b^{2n} a^{m+1} b^{2m}
   4. a^n b^n a^m b^{2m}

with n, m ≥ 1. We take k1 = 1 and k2 = 0.
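The list of four string types can be checked by brute force. The sketch below is our own check with hypothetical helper names: it enumerates every sentence of the grammar up to a given length by expanding the leftmost nonterminal (this terminates because every right side has at least two symbols, so each expansion lengthens the sentential form) and compares the result with the four listed forms for n, m ≥ 1. Production 0 and the end marker # are left out.

```python
# Example grammar of Section III, productions 1-8 (production 0 omitted).
GRAMMAR = {"S": ["cBaA", "cAA", "BaB", "AB"], "A": ["aAb", "ab"], "B": ["aBbb", "abb"]}


def sentences(max_len):
    """All terminal strings of length <= max_len derivable from S."""
    results, seen, frontier = set(), {"S"}, ["S"]
    while frontier:
        form = frontier.pop()
        i = next((k for k, ch in enumerate(form) if ch in GRAMMAR), None)
        if i is None:                       # no nonterminal left: a sentence
            results.add(form)
            continue
        for rhs in GRAMMAR[form[i]]:        # expand the leftmost nonterminal
            new = form[:i] + rhs + form[i + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                frontier.append(new)
    return results


def listed_forms(max_len):
    """Strings of the four listed types with n, m >= 1 and length <= max_len."""
    out = set()
    for n in range(1, max_len + 1):
        for m in range(1, max_len + 1):
            for s in ("c" + "a" * n + "b" * (2 * n) + "a" * (m + 1) + "b" * m,
                      "c" + "a" * n + "b" * n + "a" * m + "b" * m,
                      "a" * n + "b" * (2 * n) + "a" * (m + 1) + "b" * (2 * m),
                      "a" * n + "b" * n + "a" * m + "b" * (2 * m)):
                if len(s) <= max_len:
                    out.add(s)
    return out


print(sentences(12) == listed_forms(12))   # True
```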
The major steps in parsing a string of the first type are shown on the following pages. The 0th production is not shown in the state sets. Since the stack can expand and contract, the state sets have been given two subscripts. The second is the subscript used in Section II. The first counts the number of times that the stack has reached that length.

In the notation for states or virtual states, the set t has been omitted when it is empty. If t is a set of one element, that element is shown.
Steps to Parse String 1.

[Table: successive stack contents S_{0,0} c S_{0,1} a S_{0,2} ... | ... during the parse of c a^n b^{2n} a^{n+1} b^n #.]
Representative State Sets to Parse String 1.

Left-to-right pass:

S_{0,0} = {[1,0;#,1] [2,0;#,2] [3,0;#,3] [4,0;#,4] [5,0;a,5,4] [6,0;a,6,4] [7,0;a,7,3] [8,0;a,8,3]}

S_{0,1} = {[1,1;#,1] [2,1;#,2] [5,0;a,11,2] [6,0;a,12,2] [7,0;a,13,1] [8,0;a,14,1]}

S_{0,2} = {(1,1;#,1) (2,1;#,2) [5,1;a,11,2] [6,1;a,12,2] [7,1;a,13,1] [8,1;a,14,1]
           [5,0;b,19,11] [6,0;b,20,11] [7,0;b,21,13] [8,0;b,22,13]}

S_{0,n+1} = {(1,1;#,1) (2,1;#,2) (5,1;a,15,2) (7,1;a,17,1) (5,1;b,23,(15,23))
             (7,1;b,24,(17,24)) [5,1;b,25,23] [6,1;b,26,23] [7,1;b,27,24]
             [8,1;b,28,24] [5,0;b,29,25] [6,0;b,30,25] [7,0;b,31,27] [8,0;b,32,27]}

S_{0,n+2} = {(1,1;#,1) (2,1;#,2) (5,1;a,15,2) (7,1;a,17,1) (5,1;b,23,(15,23))
             (7,1;b,24,(17,24)) [6,2;b,26,23] [8,2;b,28,24] (5,2;b,35,(15,23))}

S_{0,n+3} = {(1,1;#,1) (2,1;#,2) (5,1;a,15,2) (7,1;a,17,1) (5,1;b,23,(15,23))
             (7,1;b,24,(17,24)) [8,3;b,28,24] (5,3;b,36,(15,23)) (5,2;b,37,(15,23))
             (7,2;b,38,(17,24))}

S_{0,n+4} = {(1,1;#,1) (2,1;#,2) (5,1;a,15,2) (7,1;a,17,1) (5,1;b,23,(15,23))
             (7,1;b,24,(17,24)) (5,2;a,42,9) (5,2;b,41,(15,23))
             (5,3;b,40,(15,23)) (7,3;b,39,(17,24))}

S_{0,n+5} = {(1,1;#,1) (2,1;#,2) (5,1;a,15,2) (7,1;a,17,1) (5,1;b,23,(15,23))
             (7,1;b,24,(17,24)) (5,3;a,47,9) (5,3;b,46,(15,23)) (7,2;a,45,1)
             (7,2;b,44,(17,24)) (7,4;b,43,(17,24))}

S_{0,3n+1} = {(1,1;#,1) (2,1;#,2) (5,1;a,15,2) (5,2;a,50,2) (5,1;b,23,(15,23))
              (5,2;b,52,(15,23)) (5,3;b,53,(15,23)) [5,0;#,62,61] [6,0;#,63,61]
              (1,2;#,60) (2,2;#,61) (7,1;a,17,1) (7,2;a,54,1) (7,3;a,55,1)
              (7,4;a,56,1) (7,1;b,24,(17,24)) (7,2;b,57,(17,24)) (7,3;b,58,(17,24))
              (7,4;b,59,(17,24)) (5,3;a,51,9)}

S_{0,3n+2} = {(1,3;#,66) [5,1;#,62,61] [6,1;#,63,61] [5,0;b,64,62] [6,0;b,65,62]
              [5,0;#,67,66] [6,0;#,68,66] (2,2;#,61)}

S_{0,3n+3} = {(1,3;#,66) (2,2;#,61) (5,1;#,62,61) [5,1;#,67,66] [6,1;#,68,66]
              [5,1;b,64,62] [6,1;b,65,62] [5,0;b,69,(67,64)] [6,0;b,70,(67,64)]}

S_{0,4n+1} = {(1,3;#,66) (2,2;#,61) (5,1;#,62,(61,66)) (5,1;b,64,(62,64))
              [5,1;b,69,64] [6,1;b,70,64] [5,0;b,71,69] [6,0;b,72,69]}

S_{0,4n+2} = {(1,3;#,66) (2,2;#,61) (5,1;#,62,(61,66)) (5,1;b,64,(62,64))
              (5,2;b,73,(62,64)) [6,2;b,70,64]}

S_{1,4n+1} = {(1,3;#,66) (2,2;#,61) (5,1;#,62,(61,66)) (5,1;b,64,(62,64))
              (5,2;b,75,(62,64)) [5,2;b,69,64]}

S_{1,4n+2} = {(1,3;#,66) (2,2;#,61) (5,1;#,62,(61,66)) (5,1;b,64,(62,64))
              (5,3;b,76,(62,64)) [5,3;b,69,64] (5,2;b,75,(62,64))}

S_{1,3n+3} = {(1,4;#,66) (2,2;#,61) [5,2;#,62,61]}

Right-to-left pass:

S_{0,0}    = {[1,5;ε] [2,4;ε] [3,4;ε] [4,3;ε] [5,4;ε] [6,3;ε] [7,5;ε] [8,4;ε]}
S̄_{0,0}    = {[1,5;ε]}
S_{0,1}    = {[1,4;ε]}
S_{0,2}    = {[1,3;ε] [7,5;ε]}
S_{0,3}    = {[7,4;ε]}
S_{0,2n+2} = {[8,2;ε]}
S_{0,2n+3} = {[8,1;ε]}
S_{1,2n+1} = {[7,2;ε]}
S_{1,2n+2} = {[7,1;ε]}
S_{1,3}    = {[1,2;ε]}
S_{1,4}    = {[1,1;ε]}
IV. Discussion

First, note that the non-deterministic language is indeed finite-state. Since there are a finite number of rules in the grammar and each has a finite number of characters on its right side, there are only a finite number of states of the form [p, j; α, m, t] or (p, j; α, m, t) which have distinct p, j and α. Since Step 3 merges states with the same p, j and α, the list t must be finite, and, up to renaming of the identifiers m, there are also only a finite number of possible state sets.

Since the method of constructing the state sets is quite complex, one might think that the algorithm cannot be very fast. This is not so. The state sets need to be constructed only once; then each state set and string of k input characters determines a new state set. Thus, the important thing is the number of state sets, and whether this number can be reduced for the grammar in question.
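The remark that state sets need to be constructed only once amounts to memoizing the transition from (state set, next k characters) to the next state set. The sketch below shows that idea with Python's `functools.lru_cache`; the transition function is a hypothetical stand-in for Steps 1 to 3, chosen only because it is deterministic and has few possible state sets, so the number of real constructions stays bounded however long the input is.

```python
from functools import lru_cache

constructions = 0   # how many times a new state set was actually built


@lru_cache(maxsize=None)
def next_set(state_set, lookahead):
    """Hypothetical stand-in for Steps 1-3: any deterministic function of
    (state set, lookahead) will do for this illustration."""
    global constructions
    constructions += 1
    return frozenset((2 * x + ord(lookahead)) % 5 for x in state_set)


def scan(text, start=frozenset({0})):
    current = start
    for ch in text:
        current = next_set(current, ch)   # a table lookup after the first time
    return current


scan("ab" * 1000)
print(constructions, next_set.cache_info().hits)
```

Here the state sets are non-empty subsets of {0, ..., 4}, so at most 31 × 2 = 62 constructions can ever occur, while the 2000 steps of the scan are otherwise served from the cache.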

In the example, a character at the left end of the string to be parsed gives information needed to make a reduction at the right end. Only after this reduction is made can information needed to make further reductions at the left end be obtained. The example could be expanded so that it could be parsed only if the same type of algorithm made three passes through the string, or so that the algorithm would have to make an arbitrarily large number of passes.

If the algorithm succeeds after some number of passes, it means that the information needed to find the correct reductions could be obtained without having to unwind the pushdown stack in more than one way. Efficiency is lost because the algorithm, which makes a complete pass in one direction and then in the other, may have to consider some characters several times. This could perhaps be avoided if the trouble spots were somehow remembered on the first pass.

The language with strings of the forms a^n b^n and a^n b^{2n} cannot be parsed by this algorithm: the a's must be counted against the b's in both ways. On the other hand, the language can still be parsed in time proportional to n by Earley's scheme. My algorithm will fail on a palindrome language, and Earley's will require effort proportional to n^2. A palindrome language can be efficiently parsed from both ends toward the middle.
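Parsing a palindrome language from both ends toward the middle can be sketched as follows. The grammar P → aPa | bPb | a | b | aa | bb is a hypothetical example chosen to have no empty productions. Each step consumes one character from each end, so the work is proportional to the string length; the function returns the productions used, outermost first, or None if the string is not in the language.

```python
def parse_palindrome(s):
    """Parse s with P -> aPa | bPb | a | b | aa | bb, working inward from both ends."""
    if not s or any(ch not in "ab" for ch in s):
        return None
    steps = []
    i, j = 0, len(s) - 1
    while j - i >= 2:                  # at least three characters remain
        if s[i] != s[j]:
            return None
        steps.append(f"P -> {s[i]}P{s[j]}")
        i, j = i + 1, j - 1
    middle = s[i:j + 1]                # one or two characters are left
    if len(middle) == 2 and middle[0] != middle[1]:
        return None
    steps.append(f"P -> {middle}")
    return steps


print(parse_palindrome("abba"))   # ['P -> aPa', 'P -> bb']
```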
It should be possible to find more general methods of parsing in time proportional to the string length.